-
As demand grows for job-ready data science professionals, there is increasing recognition that traditional training often falls short in cultivating the higher-order reasoning and real-world problem-solving skills essential to the field. A foundational step toward addressing this gap is the identification and organization of knowledge components (KCs) that underlie data science problem solving (DSPS). KCs represent conditional knowledge (knowing which actions are appropriate given particular contexts or conditions) and correspond to the critical decisions data scientists must make throughout the problem-solving process. While existing taxonomies in data science education support curriculum development, they often lack the granularity and focus needed to support the assessment and development of DSPS skills. In this paper, we present a novel framework that combines the strengths of large language models (LLMs) and human expertise to identify, define, and organize KCs specific to DSPS. We treat LLMs as "knowledge engineering assistants" capable of generating candidate KCs by drawing on their extensive training data, which includes a vast amount of domain knowledge and diverse sets of real-world DSPS cases. Our process involves prompting multiple LLMs to generate decision points, synthesizing and refining KC definitions across models, and using sentence-embedding models to infer the underlying structure of the resulting taxonomy. Human experts then review and iteratively refine the taxonomy to ensure validity. This human-AI collaborative workflow offers a scalable and efficient proof-of-concept for LLM-assisted knowledge engineering. The resulting KC taxonomy lays the groundwork for developing fine-grained assessment tools and adaptive learning systems that support deliberate practice in DSPS. Furthermore, the framework illustrates the potential of LLMs not just as content generators but as partners in structuring domain knowledge to inform instructional design.
Future work will involve extending the framework by generating a directed graph of KCs based on their input-output dependencies and validating the taxonomy through expert consensus and learner studies. This approach contributes to both the practical advancement of DSPS coaching in data science education and the broader methodological toolkit for AI-supported knowledge engineering.
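The structure-inference step described in this abstract (embedding KC definitions and grouping similar ones) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the KC definitions are invented, `embed` is a bag-of-words stand-in for a real sentence-embedding model, and the greedy threshold grouping substitutes for whatever clustering method the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical KC definitions, invented for illustration (not from the paper).
KCS = [
    "choose an imputation method for missing values",
    "decide whether to drop rows with missing values",
    "select an evaluation metric for the model",
    "choose an evaluation metric for imbalanced classes",
]

# Small stopword list so that function words do not dominate similarity.
STOPWORDS = {"a", "an", "the", "for", "to", "with", "of", "whether"}


def embed(text):
    """Toy stand-in for a sentence-embedding model: bag-of-words counts."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_kcs(kcs, threshold=0.3):
    """Greedy grouping: each KC joins the first group whose seed KC is
    at least `threshold` similar, otherwise it starts a new group."""
    groups = []  # list of (seed_vector, member_list) pairs
    for kc in kcs:
        vec = embed(kc)
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(kc)
                break
        else:
            groups.append((vec, [kc]))
    return [members for _, members in groups]


if __name__ == "__main__":
    for i, members in enumerate(group_kcs(KCS), start=1):
        print(f"Candidate category {i}:")
        for kc in members:
            print(f"  - {kc}")
```

In the workflow the abstract describes, candidate groupings like these would then go to human experts for review and refinement rather than being accepted as the final taxonomy.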
-
As demand for data scientists has increased to inform decision-making across multiple fields of societal importance, postsecondary institutions have expanded data science course offerings. Despite such growth, educators struggle to teach students all the skills central to data science. They focus on programming and statistical tools and lack time for mentoring students in data storytelling. For this working paper, we reviewed the literature and interviewed experts to model the domain knowledge of data storytelling, with the goal of informing the design of intelligent technology that supports data storytelling instruction at scale. The paper closes by recommending two ways that artificial intelligence tools can support the development of students' data storytelling knowledge and skills: "direct" feedback to students on routine data science tasks and "facilitated" summaries of students' data story progress to inform instructors' feedback. We intend to apply these insights to the design of intelligent coaching in an online platform to support the development of storytelling competency at scale.
-
Data storytelling is the skill of communicating data effectively and efficiently. Effective data storytelling goes beyond data visualization and focuses on explanation with clear rhetorical functions. It starts with a set of data insights collected from the data science workflow and involves iterative and interactive processes of filtering those insights into story slices, from which data stories can be created through ordering, organizing, and narration. Data storytelling is an integral component of a well-rounded data science education, complementing foundational skills like quantitative reasoning and programming. Despite its significance, a solid understanding of the theory and practice of developing data storytelling competency is lacking. Data storytelling is often perceived as a mythical process in which quantitative information magically transforms into compelling narratives. Designing scalable coaching tools for data storytelling requires multidisciplinary expertise from learning science, computer science, data science, communication science, and human-centered design. In this workshop, we will share initial findings and reflections from our interdisciplinary team's search for effective methods and tools to support coaching data storytelling at scale. We will present results from literature reviews and expert interviews, which will be packaged into a set of foundational tools such as a mental model, cognitive processes and schemas for story construction, and an assessment strategy, as well as preliminary ideas for tools to support data storytelling coaching. We hope to use this workshop to build a community of researchers and practitioners in coaching data storytelling in postsecondary formal and informal learning contexts.